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Abstract 
We present an identity that provides an unbiased estimate of a general statistical 
distribution. The identity computes the distribution density from dividing a histogram 
sum over a local window by a correction factor from a mean-force integral. We show 
that the mean force can be evaluated as a configuration average, and the optimal window 
size is roughly the inverse of the local mean- force fluctuation. The new identity offers a 
more robust and precise estimate than a previous work by Adib and Jarzynski [J. Chem. 
Phys. (122): 14114, 2005]. It also allows a straightforward generalization to an arbitrary 
ensemble and a joint distribution of multiple variables. Particularly we derive a mean- 
force enhanced version of the weighted histogram analysis method (WHAM). The 
method can be used to improve distributions computed from molecular simulations. We 
illustrate the use in computing a potential energy distribution, a volume distribution in a 
constant pressure ensemble, a radial distribution function and a joint distribution of 
amino acid backbone dihedral angles. 



I. Introduction 

In the study, we present a method for estimating a general statistical distribution 
from data collected in a molecular simulation. The method is superior to the common 
approach of using a normalized histogram, which suffers from either a large noise when 
the bin size is small or a systematic bias when the bin size is large. 

Our identity is akin to a previous one derived by Adib and Jarzynski ! (hence the 
AJ identity), whereby the distribution density p(x) at a point x is estimated from the 
number of visits to a window surrounding x, plus a correction from integrating the 
derivative of p(x) . The AJ identity improves over the histogram-based approach not 
only by eliminating the systematic bias from binning but also by smoothing out the 
resulting distribution, as the window contains much more data points than a single bin. 
However, the identity is slightly inconvenient as it neither ensures a positive output, nor 
determines its optimal parameters. 

Here we present a new identity in which we construct a proper correction factor 
and use it to divide the number of visits to a local window to reach an unbiased estimate, 
as schematically illustrated in Fig. 1(a). The new strategy not only guarantees a positive- 
definite distribution, but also offers a simple estimate of the optimal window size, as it 
separates the error contributions from both the histogram and the mean force. The new 
identity also allows straightforward extensions to an arbitrary ensemble and to a joint 
distribution of multiple variables. 

We describe the new identity in section II, present a few numerical applications in 
section III, and conclude the article in section IV with a few discussions. 



II. Methods 

II.A Integral identity 

We wish to find an expression for the distribution density p(x) at a point x = x" . 

To do so, we first approximate p{x*) by a histogram sum over a local window (x_,x + ) 
enclosing x = x* , and then apply a correction factor. Formally, we have 

p(x) dx 

, * x Jx_ 

P(x ) = — ; , 

\ ' p{x)l p{x*) dx 

where the numerator p(x) dx counts the fraction ofx falling into the window (x_,x + ) , 

and can thus be measured from the histogram sum over the window. 
Next, we convert the denominator to an integral as 

I p{x)l p(x ) dx = I Qxp[logp(x)-logp(x )]dx 
= £ + exp f £, (log p) \y) dy J Jx, 

where we have moved p(x*) into the integral in the first step, then express the logarithm 
of the ratio as an integral of the "mean force" (log/?)'^) . Hence, we have 

r x + 
p(x)dx 

***)=« Tjt S— • (^ 

I expl £. (log/?)'00</yj dx 

We refer to Eq. (I) as the fractional identity in the follows. Unlike in the AJ identity ', 
here the correction is applied as a divisor instead of additively. Nevertheless, it can be 
derived as a near-optimal modification of the AJ identity, as shown in Appendix A. 



II.B Mean force from direct averaging 

The identity Eq. (1) requires a mean force (logp)'(x) in addition to a histogram. 

In the follows, we construct a conjugate force f x = f x (r N , s) , as a function of molecular 

coordinates r N and other ensemble variables s , such that its ensemble average is equal 
to the required mean force. Thus, in a molecular simulation, we can compute f x for each 
trajectory frame and use its average as the mean force. 

We first express x as a function x = X(r N ,s) of both molecular coordinates r N 

and (optionally) some variables s of the simulation ensemble, e.g., s can be the volume 
in an isobaric ensemble, or the temperature in a tempering simulation 2 ' 3 . 
The distribution density p(x) can now be written as 

p(x) = J S(X(r N ,s)-x) w(r N , s) dr N ds , (2) 

where S(...) is the Dirac ^-function, and ^(r^ , s) is the ensemble weight for a 
configuration r* and parameters s,e.g., w(r N ,s) <x Qxp[-{3U(r N )] in case of a 
canonical ensemble (with U(r N ) being the potential energy, /? temperature). 
We evaluate its derivative as 

p\x) = f - — 5 (X(r N , s) - jc) w(r N , s) dr N ds, 
J dX 

where we have used dS (X - x)/dx - - dS (X - x)/8X . 

We proceed by introducing a vector field v = v(r N , s) that satisfies v • VX = 1 as 
a projection vector. For any vector u, the inner product v u gives the amplitude of the 
projection of u onto the gradient VX . A convenient choice of the projection vector v is 
v = VX l{yX ■ VX) (more generally it could be constructed from an arbitrary vector field 



Y as v = Y /(VX • Y) , as long as VX • Y ^ ). For later convenience, we have defined v 

and V on the joint vector space spanned by both the coordinates r N and parameters s . 

To use the projection vector, we insert v • VX = 1 into the integrand and recall that 
the 5 -function depends on coordinates r^ or parameters s only through X Thus, the 
product of VX and (d/dX) can be replaced by a single gradient operator V , and 

p'( x ) = J - v • VS(X(r N ,s)-x) w(r N , s) dr N ds 
= jS(X(r N ,s)-x)w(r N ,s) f x dr N ds, 

where we have integrated by parts, shifted the V to the rest of the integrand, and defined 
a conjugate force f x = V • v + v • V log w(r N , s) . Using Eq. (2), we find the mean force 
(logp)'(x) = p'{x)l p{x) is the average of f x 

(logp)'(x) = (f x ) x ={v-y + yVlogw(r N ,s)) x . (3) 

where (• • •) denotes an configuration average under a fixed x. Note, the above derivation 

is analogous to that of the dynamic temperature by Rugh 4 . For a canonical ensemble, the 
second term v • V log w is reduced to - (5 v • VU , i.e., the amplitude of the projection of 
the molecular force to the gradient ofX, which is in accordance with the meaning of the 
"mean force" of (f x 



II. C Optimal window size 

We now determine the two window boundaries x_ and x + in Eq. (1) such that 

they minimize the statistical error in p(x*). 



We first note that the histogram and mean-force data contribute independently to 
the numerator and denominator, respectively. How much the two contribute is however 
controlled by the window size. The output is dominated by the histogram contribution 
(numerator) with a narrow window, but by the mean-force contribution (denominator) 
with a wide window. For a narrow window, the denominator is reduced to the window 
width, and thus the identity is approximately a histogram average. At the other extreme, 
if the window covers the entire domain of x, the numerator becomes a constant, and the 
distribution is determined entirely by the mean-force integral on the denominator, i.e., 



p(x*) = exp 



j (log p)'(x)dx 



(the lower bound of the integral is determined by the 



normalization). 

As we increase the window size, the error of the numerator decreases as more 
data points reduces uncertainty, but that of the denominator increases as the error in the 
mean-force integral accumulates. The sum reaches a minimum at the optimal window. 

Quantitatively, the relative error of the numerator e(N)/N is l/\N , where 
N = N{x_,x + ) is the number of independent data points included in the window. 

The relative error of the denominator D is harder to compute exactly and thus is 
estimated from an upper bound. First, since D is an integral of exp(Alogyo) , the relative 
error of D must be less than the maximal relative error of exp(Alogp) , or equivalently, 
the maximal absolute error of A log p. Next, since A log/? itself is an integral of the 

mean force, i.e., Alog/7(x) = j ,(f x )(x') dx' , the maximum is likely to occur at either 
window boundary x - x ± . In the discrete version, the integral becomes a sum over bins 



as Alogp = ^. (/*)(*,•) Sx with^x being the bin size. Its error [^(Alog/?)] 2 , assuming 
no correlation among mean force at different bins, is ^o"/ (x t ) Sx 2 Sn{x t ) , where 
cr^(x.) is the variance of the conjugate force at x. , and Sn(x t ) is the number of 
independent data points in the bin at x t . 

On reaching the optimal window, including one more bin from either edge would 
keep the combined error {s(N)/N) +{e(D)/D) constant, i.e., 

Sn(x ± ) | ^/ 2 fc) jT 2_ 
N(x_,x + ) 2 Sn(x ± ) 

The first term represents the decrease of error due to increased sample size, while the 
second represents the increase due to the mean-force integration. Thus 

— — - = o>.(*±) Sx. 

N(x_,x + ) 

For a relatively a window, we have N ~ Snwl Sx, with w being the window 
width. If we further replacing a f (x ± ) by a local mean a f , then 

w = yJG f , (4) 

where y is a heuristic factor which should be 1.0 ideally. However, as we overly 
estimate the error of the denominator, Eq. (4) somehow underestimates the optimal 
window size. In practice, we found the optimal y was 1 ~ 2 . 

On using Eq. (4), we remind the reader that <j f is the mean-force fluctuation at a 
fixed x, i.e., al(x) = (f 2 ) -(f) , and thus is evaluated at the bin located at x. After the 



intra-bin calculation, the quantity can then be averaged over a local or global window to 
produce a f . 

If the mean-force fluctuation is very small, Eq. (4) suggests abandoning the 
histogram data and switching to a mean-force integration. On the other hand, if the 
mean-force has a very large variance, one should stick to the histogram. Thus the method 
is effective only if the mean-force fluctuation is small. An attempt to reduce the mean- 
force fluctuation is described in Appendix B, where we improve the mean force itself by 
using data of the second-order derivatives of the distribution. 

II.D Extension to weighted histogram analysis method 

We now extend Eq. (1) to a composite distribution, which is a combination of 
several distributions under different conditions. The composite distribution can result 
from different independent simulations or an extended ensemble simulation, such as a 
tempering simulation, i.e., simulated 2 and parallel tempering 3 . For concreteness, we 
shall assume the composite distribution result from canonical distributions under different 
temperatures /?, with i being its label. Hence, each individual distribution is p(x, /?,) . 

The aim is to estimate the distribution p(x, /?) at /? , which needs not to be one of 

the /?, 's. A standard routine for this is the multiple histogram method, also known as the 

weighted histogram analysis method (WHAM) 5 . Here, we derive an improved version 
whereby mean force information is utilized to improve the method. 
Similar to the derivation of Eq. (1), we start from 



p(x,P) 



Y Ji N i \y(x,P i )/p(x,P)dx 



where N t is the total number of independent data points from the simulation at the 
temperature /?, , and the sum is carried over different temperatures fi. . To proceed, we 
simultaneously multiple and divide p(x* ', ft) on the denominator, 

y n, r^Mdx = y Ni ^U^ r^M. dx , 



p(x ,/?) 



p(x ,/?) **- p(x ,fi) 



where the x-independent ratio p(x* , P i )/p(x*, fi) has been moved out of the integral, 



leaving the other ratio p(x, P f )/p(x ,j3 f ) to be converted the mean force integral at a 



fixed /?. as usual, 



p(x,P) 



^N^y^^dx 



£J N i p{ r : ^ f +ex p[r ( l °gpy(y> mm dx 



(5) 



For example, in case of the potential energy U distribution in a canonical 
ensemble, we have w(U,/3) = exp(-/?L/)/Z(/?) , with Z(/?) being the partition function, 



P(U*,J3) = 



^N^piU^JdU 



X, iV ( . exp[-(£ -/?)£/*)] _W f% xp J ((w^-fl)^ 



\ ' 



dU 



(6) 



Z(ft) ™- 

where v = VU/VU -VU . The regular WHAM 5 is recovered with an infinitesimal 
window U_=U + =U* . Generally, Eq. (5) improves the histogram method by using the 
mean force data, e.g., the dynamic temperature (V • v) , here. Note (V • v) , is not a 
function of the temperature ft , and since any two configurations with the same potential 
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energy U' always carry the same weight under any temperature (5 , the data for (V • v 
collected from different temperatures are identical and thus can be merged. 



III. Applications 

III.A Potential energy distribution 

We first compute a potential energy distribution p(U) in a canonical ensemble, in 

which w(r N ,/3) = Qxp[-/3U(r N )]/Z(/3) with Z(/?) being the partition function. 
According to Eq. (3), the mean force (log p)'{U) can be computed from averaging 
f v - V • v - /3 , where v = VU/(VU ■ VU) . Note the conjugate force f v has a clear 
physical meaning as the difference between the dynamic temperature 4 ' 6 (V • v) and the 
simulation parameter /? . 

We performed a molecular dynamics (MD) simulation on a 256-particle Lennard- 
Jones system under a smoothly switched potential (see Appendix C, r s = 2.0 and 

r c = 3.0 ). The temperature J3 - 1.0 , density p - 0.8 , time step At = 0.002 . Velocity 

rescaling was used as the thermostat 7 with a time step 0.01. 

The test sample consisted of 10 4 data points picked out from a trajectory of 10 7 
MD steps with an interval of every 1000 steps. All frames from the trajectory were used 
to construct a reference distribution for comparison. A histogram of bin size SU = 0.1 

was used in data collection and analyses. 

We first demonstrate the use of the fractional identity Eq. (1) with a fixed window 
size AU = 12.5 , a value determined from Eq. (4) (y - 1.0). The mean force (\ogp)'(U) 
was computed from a single bin at [/(in case of an empty bin, we symmetrically enlarged 
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it until the window contained at least one data point). As shown in Fig. 2(a), the resulting 
distribution was much smoother than the histogram. For comparison, we computed the 
result from the AJ identity with the same window size. Though the results are generally 
similar, the AJ identity sometimes yielded negative values at the two edges, while the 
fractional identity appeared to be more robust and closer to the reference. 

To show the gain from the integral identity approach, we define a KS difference 
(as commonly used in the Kolmogorov-Smirnov (KS) test 8 for detecting the difference 

between two distributions) as A KS = (<Jn + 0.11 + 0. 12/VW)A CDF , where N is the sample 

size, and A CDF is the maximal difference between the cumulative distribution function 

(CDF) F{x) = p(y)dy of the resulting distribution and that from the reference. The 

J— oO 

smaller the quantity is, the more accurate the test distribution is. As the measure is 
independent of the bin size, it mainly detects the systematic bias in the test distribution 
instead of the smoothness of distribution density. As the identity is not optimized for the 
CDF (but for the distribution density), the KS difference serves as a stringent test. 

The KS differences, computed for the histogram, the fractional identity Eq. (1) 
and the AJ identity Eq. (7) are shown in Fig. 2(b). It is clear that that both identities 
rendered more accurate distributions than the histogram. We also show there was an 
optimal window size that minimized the error. However, for the fractional identity, the 
optimal window size AU « 20.0 was greater than the value 12.5 given by Eq. (4). Thus, 

a factor y « 1 .5 was used in other examples. Recall the KS difference scales with ^N , 
thus from Fig. 2(b) we estimated about 20-fold increase of efficiency of using data 
around the optimal window. We also notice that with a smaller window, the fractional 
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identity gave better estimates than the AJ identity. This was expected as that the 
fractional identity is the optimal modification of the AJ identity in this case (Appendix 
A). With a larger window, the errors from both identities grew rapidly due to the larger 
involvement of the mean force data. As the factional identity quickly switched to a 
mean-force based integral with a large window, its growth was faster. The comparison 
shows that choosing the window size is crucial to the success of the integral identity, and 
an overly large window can be counterproductive. 

We performed a similar comparison in terms of the entropic distance defined as 

D(p\\p Tef ) = \ log[p(y)/p ref (y)]p(y)dy for two distributions p(x) and p ref (x) . For 

J— oo 

the AJ identity, in case p(x) < , zero was assumed. Unlike the KS difference, this 
quantity directly compares the distribution density. As shown in Fig. 2(c), the fractional 
identity consistently produced a small entropic distance than the AJ identity, suggesting 
an improved smoothness. Interestingly, the error of the entropic distance also has a 
minimal, which occurred at a similar location MJ « 20.0 to that from the KS difference. 

We now demonstrate the mean force improved weighted histogram analysis 
method (WHAM) introduced in Sec. II.D. In this case, we performed additional 
simulations at two neighboring temperatures T = 0.8 and T = 1.2 . The reweighted 
distributions to T = 1.0 from both the original WHAM and the improved version are 
shown in Fig. 2(d), and as expected, the latter is much smoother than the former. 

We further note that the identity approach is ensemble-dependent because the 
mean force depends on the ensemble weight w. To illustrate the point, we simulated the 
same system using a regular molecular dynamics without a canonical thermostat, i.e., we 
targeted a microcanonical ensemble in which the total energy was kept as a constant. In 
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the ensemble, the weight for a configuration, after averaging out momentum components, 
isw(r N )cc^K = ^E M -U(r N ) , where N f is the number of degrees of 
freedom, and K, U and E tot are the kinetic, potential and total energy, respectively. The 

\N f -\ 
conjugate force is accordingly f v = V • v — , where N f = 3N - 6 , and the 

-^tot _ ^( r ) 

constant reference temperature J3 in the canonical ensemble is changed to an energy- 
dependent term (\N f -l)/[£ tot -U(r N )]. 

The microcanonical-ensemble simulation was similar to the canonical-ensemble 
one. During equilibration, the kinetic energy was scaled regularly to match T « 1.0, and 
was kept as a constant afterwards (E tot = -932 ). As shown in Fig. 2(d), the distributions 
and mean forces (lower inset) from the two ensembles differed considerably, whereas the 
dynamic temperature (V • v) (upper inset) matched. This example shows the importance 
of applying the correct formula for the mean force. 

III.B Volume distribution 

In the second example, we compute a volume distribution p{V) in an isothermal- 
isobaric (i.e., constant temperature and pressure) ensemble 9 ' 10 . Unlike the previous case, 
the volume Fis not a function of system coordinates, but an additional variable in the 
ensemble weight w^r^, V) . Particularly, the volume V serves as a scaling factor that 

translates the reduced (0 to 1) coordinates R, to the actual ones as r. = v^R, . In terms 
of reduced coordinates, the ensemble weight can be written as 



14 



w({R.},F) dR N dV <x exp[-j3U({W'R i })-/3pV] V N dR N dV , where /? and/? are the 
reciprocal temperature and pressure, respectively. 

According to Eq. (3), the conjugate force f v is reduced to dlnw/dV in this case 
(the vector field v is the unit vector along the direction of the parameter V, and hence 

V-v = 0, \-V = d/dV). Thus, f v = N/V-fir-VU/(3V)-/3p = /3[p c (r N )-p], 

represents the difference between the instantaneous pressure ^(r") = (N + fir ■ F/3)/F 

and the simulation parameter/?. 

There is however a subtle distinction between the apparent volume distribution 
p(V) defined above and the actual physical one p(V) . The difference arises from the 
fact that the partition function Z(f3, V) in the canonical ensemble includes all 

configurations with a volume no larger than, instead of equal to, the volume V of the 
simulation box. The partition function for configurations with volume precisely being in 
(V - dV,V) is given by (8Z/8V)dV = {p^ y ZdV . Thus the actual volume distribution 

p(V) differs from p(V) by a factor, i.e., p{V) oc (p c ) v p(V) . The reader is referred to 

Koper and Reiss 10 for a lucid and more detailed discussion. 

A direct simulation according to p(V) is however inconvenient, as it requires one 

to know (p c ) v in advance. In the follows, we shall use p{V) to populate configurations 
during simulation, but report the adjusted p{V) after applying the correction. 

We performed an MD simulation on the 256 Lennard- Jones system using the 
switched potential (with r s =2.5 and r c =3.5). The temperature 7 = 1.24 and pressure 

p — 0.1 15 is around the critical point. Velocity rescaling was used as the thermostat 7 
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with a time step 0.01. For the pressure control, Monte Carlo volume moves were tried 
every two MD steps with a maximal magnitude of ± 2.0% of the side length of the box. 
The trajectory contained 10 7 steps with the time step dt = 0.002 . 

We constructed the test sample from picking one out of every 100 frames from 
the trajectory and collect them using a histogram of bin width SV = 1 .0 . Data from the 
entire trajectories was used to produce reference curves. Since the volume changed 
almost by an order of magnitude, and the mean force fluctuation <j f cc \/V , a fixed 

window size was not suitable. Thus we applied Eq. (4) with a f estimated from a local 

window, and y = 1.5 (heuristic value). To apply the correction, a smooth (p c ) v was 

computed from a second integral identity, Eqs. (10) and (1 1), in Appendix B. The 
window size was similarly determined from the local <j f but with y = 3.0 . 

As shown in Fig. 3, the volume distribution p(V) from the fractional identity was 
smoother than the histogram although it still had some ruggedness. From the inset, we 
observe that the window size grows linearly with the volume V. This example also 
clearly illustrates the danger of using an overly large window. We show in Fig. 3 the 



distribution from a pure mean-force integration p(V) = exp 



f '(log p)'(V')dV' 



(which 



is the limiting case of using an infinite window, also corrected to the corresponding p(V) 
in the figure) manifested a much larger deviation from the reference (the deviation was 
however not systematic, as it diminished with the sample size). Thus choosing a proper 
window is crucial to the success of the method. 
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III.C Radial distribution function 

In the third application, we compute a radial distribution function g(r) . We note 
that g(r) is normalized in such a way that g(r) — > 1 at a large distance r, and thus it 
relates to the actual distribution density p(r) as p(r) = g(r)A7tr 2 /V m , where V m = nV /6 
is the volume of the sphere that fills the assumingly cubic simulation box, and the factor 
47rr 2 dr/V m gives the total probability of finding another particle in a sphere (radius r and 
thickness dr) centered at the test particle, if all particles were non-interacting. Thus, we 

modify Eq. (1) to g(r*) = \ + p(r) dr / j\4m- 2 /V m ) exp[j\ (log g)'(y)dy] dr. 

In a canonical ensemble, where w(r) oc exp(-/?(7) , the conjugate force of g(r n ) 
for a pair of particles 1 and 2 is f r = |/?F 12 -f 12 , where /? is the reciprocal temperature, 
F 12 =Fj — F 2 is the difference between the force exerted on particles 1 and 2, and r 12 is 
the unit displacement vector from particle 2 to particle 1 . To derive the result, we apply 
Eq. (3) with a vector field v. =jT u (S u -S 2i ), and note that (/ r ) here corresponds to 
(logg)'(>) instead of (log p)'{r) , and thus the divergence V • v term is exactly cancelled 
by the derivative from the additional normalization factor, as -^XogAw 2 /V m = V • v = 2/r . 

We simulated the 256 Lennard- Jones system under two different temperatures 
T x = 0.85 and T 2 = 0.4 , using the switched potential (r s - 2.5 and r c =3.5 ). The density 

p = 0.7 in both cases. Velocity rescaling with time step 0.01 was used as the thermostat 
1 . After 5><10 steps of equilibration, we simulated another 5 x 1 4 steps with a time step 
dt = 0.002 . From the entire trajectory, we picked 5 frames (from every 10 4 steps) as the 
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test sample, and 500 frames (every 10 steps) as the reference. The bin size 8r = 0.002 
was used in data collection and analyses. 

Since this example was also used in the original Adib-Jarzynski paper , it is 
instructive to make a direct comparison between the two identities. As shown in Figure 
4, the fractional identity produced smooth distributions with good agreement with the 
respective references in both temperatures, despite a relatively small sample size. On the 
other hand, although the Adib-Jarzynski identity (Eq. (20) in the reference with R x =1.0) 
1 also produced smooth distributions, there were appreciable deviations (especially at the 
lower temperature T 2 = 0.4) from the reference. The deviations were again not 
systematic, as they diminished with the sample size. We note the large deviations from 
the AJ identity were similar to those observed in the previous example of the volume 
distribution when a mean force integration was used to produce the distribution. Thus 
they were likely due to an overly large window, as the entire range of r, from to half of 
the box size, was used as the window by the AJ identity. The error was larger at a lower 
temperature because the mean force changed more drastically there (hence larger mean 
force fluctuation). By contrast, in the fractional identity case, smaller windows, 
Ar = 0.14 for T x = 0.85 and Ar = 0.09 for T 2 = 0.4 were used according to Eq. (4) 
( y = 1 .5 ), and thus the output was more robust. We also note that the optimal window 
size shrunk at the lower temperature. This trend is universal. Since f x has a component 
/?v • F in the canonical ensemble, as one increases (3 (or lowers the temperature), the 

fluctuation grows, and thus the window size shrinks. The example again illustrates the 
critical influence from the window size. 
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III.D Amino acid backbone dihedral angles 

Finally, we compute a joint distribution p(q>, y/) of the two backbone dihedrals q> 
and y/oi a glycine dipeptide. Here q> and yr are the C'-N-C„-C andN-C„-C-N' dihedral 
angles respectively. To apply the identity, it is necessary to generalize Eq. (1) to the two 
dimensional case 



, * * >. Jw_ J(h_ 

Pi<P ,V ) = - 



T + f + exp[log/?(^, ^)-log/?(^*, y/*)]d0dy/ 

Jy/_ <J<p_ 

On the denominator, \ogp(q>, y/) can still be computed from the two mean force 
components -jj- log p{q>, y/) and -^- log p{q>, y/) , but not via a direct integration, as the 

calculation is an overdetermined one (i.e. one distribution but two derivatives). Instead, 
we constructed log p(q>, y/) in such a way that its two partial derivatives matched the 
observed values with a minimal overall deviation, see Appendix D. 



The mean force are computed as averages as -^\og p(<p, y/) = (/?u • F) and 



s 

ci// 



log p{cp, y/) = ( K /3\-¥} , in which (5 is the temperature, F is the force, and the two 

vector fields u = V <p/(V (p ■ V<p) , v = V y//(V y/ -Vy/) . A caveat was that in computing 
u and v, we only included the gradient component from the 1,4 atoms, i.e., atoms C and 
C for V cp , atoms N and N' for V y/ . Thus the mean force formulas (as shown above) 
were free from both the cross correlation between the two mean forces and the two 
divergences V • u and V • v . 

We dissolved the glycine dipeptide in a 32x32x32 A 3 TIP3P n water box, and 
simulated 36 ns with a time step 1 femtosecond. All chemical bonds of the peptide were 
allowed to vibrate. A double precision GROMACS 4.5 12 was used as the simulating 
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engine, The velocity rescaling method , SETTLE , and particle meshed Ewald (PME) 
sum l were used for thermostat, constraints in water molecules and long range 
electrostatic interaction, respectively. Non-bonded interactions were cutoff at 7 A and 
shifted to zero until 8 A. The PME grid spacing was 12 A. Dihedral data were collected 
every step using a l°xl° bin. 

Due to a relatively large mean force fluctuation, we were only able to use small 
4°x4° windows for the fractional identity, according to Eq. (4) (however, in case the 
window was empty, we expanded the window symmetrically until it included at least one 
data point). In Fig. 5, we show that the distribution p{(p,y/) from the fractional identity 
improved the smoothness over that from the normalized histogram. Particularly, the 
barrier regions, e.g. q> ~ 0° , y/ « ±100° , were enhanced. Additionally the forbidden band 
at <p « ±180° , where the histogram was simply missing, was now filled by small but 

finite numbers. On the other hand, peaks at the helical and extended conformations were 
well preserved. 

IV. Conclusions and discussions 

In conclusion, we presented an identity, Eqs. (1) and (3), for estimating a general 
statistical distribution from data collected in molecular simulations. Compared with the 
previous identity-based approach ' 5 , the new identity offers a more general applicability 
(to any variable x and any ensemble, easily extensible to higher dimensional distributions, 
etc.), and at the same time a more robust and precise output. 

The general expression for the conjugate force f x Eq. (3) is also simpler than the 

conventional one from - /? dU/dx = -/3VU ■ dr N /dx 15 in that it avoids a usually complex 
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coordinate transformation in computing dr N fdx by replacing it with a simple force 
projection /?v F plus a divergence V • v . Thus, it is applicable, at least in principle, to 
an arbitrary x = X(r N ). Though the computation of the divergence adds certain 
computational complexity, it can often be simplified or even avoided with a careful 
choice of the vector field v (as in the dihedral example). 

We also showed that the window size should be carefully chosen to maximize the 
benefit from the identity. An overly wide window risks a large error from the mean-force 
integration, although it usually yields a smoother distribution. 

Finally, we wish to distinguish the method from an explicit smoothing method 16 . 
The current method does not assume a smooth distribution, but only seeks an exact 
expression for the distribution density with a hopefully minimal error. Although the use 
of the mean force (log/?)' implies a differentiable distribution density p(x), the mean 
force itself can be as oscillatory as an apparent noise. It is therefore possible that with 
reasonable approximations, the method can be further improved by introducing elements 
of the explicit smoothing methods. 
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Appendix A. Alternative derivation of the fractional identity 

We first rephrase the work of Adib and Jarzynski as the follows. The presentation 
is given with respect to a general distribution, and thus formally differs from the original 
one l , which focused on a radial distribution. Nevertheless, the basic features are similar. 

Firstly, any function p(x) at x - x can be evaluated as an integral over a 

window (x_, x*) as 

p(x*) = p(x") </>(x*) -p(x_) 0(X_) 

= \ X [p(x)0'(x) + p'(x)0(x)]dx, 

where </>{x) is an arbitrary function under two conditions, </>{x_) = and <fi(x*) = 1 ; on 
the second line, the difference is converted to an integral within the range. 

The range of integration can be enlarged to a window (x_, x + ) that encloses x by 

applying the above equation to two windows (x_, x") and {x* ,x + ) , (x_ < x < x+) with 
different </>(x) 's, and then linearly combining them, see Fig. 1(b), 

P(x*)= r[p(x)0'(x) + p'(x)0(x)]dx, (7) 

Jx_ 

where the combined <p(x) satisfies 

U(x + ) = 0(x_) = O, 

V(x*-)-^(x* + ) = l, 

with 0(x*~) and <fi(x* + ) , the value of </>{x) immediately at the left and right of x* 
respectively, serving as coefficients of combining the two windows. The function <p(x) 
is equivalent to the vector field u(r) in the original paper ' . 
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The problem of Eq. (7) is that it does not always yield a positive output since the 
correction in p'(x) is free to overthrow the histogram contribution by random error. 

We now derive the fractional identity Eq. (1) from Eq. (7). We start from a 
simple observation: if f(x) is a function that equals to unity at x = x* , i.e., f(x*) = 1 , 
then Eq. (7) applies not only to p{x) itself, but also to the product p{x)f{x) , i.e., 

P(x) = f + [p(x)f(x)] <j>'(x)dx + f + [p(x)f(x)]'0(x)dx . 

An arbitrary f{x) , or equivalently a reference distribution ' 5 , does not guarantee 
a non-negative output. However, if we choose f{x) such that p(x)f{x) a constant, the 
second term on the right hand side of the above equation vanishes, and 

p(x*)=t[p(x)f(xM(x)dx. (9) 

Jx_ 

Thus p{x*) is nonnegative as long as <p'(x) is so. The function f(x) is obtained by 
integrating the distribution mean force from the boundary f(x*) = 1 as 



/0) = exp(-[log/?0)-log/?(x )] 
= exp I -£,Qog p)'(y)dy 



It is easily verified that -^iogf(x) = --^logp(x) and hence -£[p(x)f(x)] = . 

Finally, we determine <ft(x) based on the observation that <fi'(x) acts as a weight 

in Eq. (9). Thus, to minimize statistical error, <fi'(x) should be inversely proportional to 

the variance of p(x)f(x) 9 . For a small window, we assume that the error of p(x)f(x) 

comes mainly from the number of visits p(x) instead of the modulation factor f(x) . 

Additionally, we assume the variance of p(x) is proportional to p(x), i.e., a Poisson 

distribution 9 , we thus have Yar[p(x)f(x)] = Vav[p(x)] f 2 (x)x p(x) f 2 (x)°c f(x) . 

24 



<l>'{x) can now be written as Cj fix) , where the constant C is determined from Eqs. (8) as 
1 = <j)(x )- <j>(x_ ) + <j>(x + ) - <j)(x* + ) = [ V(X) dx (the singularity at x = x* is ignored). 

Jx_ 

Solving the equation gives C-\\ exp J , (log p)'(y) dy\dx, and Eq. (1) is recovered. 



Appendix B. Improving the mean force 

As suggested by Eq. (4), the optimal window size, hence the precision of the 
output from the integral identity Eq. (1), increases with the reduction of the mean force 
error. Hence, it is advantageous if one can improve the mean force (log p)'(y) itself. If 
the second-order derivative (logp)"(x) is available, one can apply an Adib-Jarzynski-like 
identity (see Appendix A): 

(log />)'(*„) = f [cp'(y) (iogp)'(y) + cp(y) (log p)"(y)] dy , (10) 

Jx_ 

where <p(x) satisfies (p{x_) = (p{x + ) = , <p{x Q -8)- <p{x + S) - 1 . In practice, we use a 
linear function <p{x) = {x- x b )/(x + - x_) , with x h = x_ if x < x or x h = x + otherwise. 
Note the window (x_,x + ) can be different from that in Eq. (1). An averaging expression 
for computing (logp)"(x) can be found by taking the derivative of Eq. (3) 

(logp)"(x) = ((AA) 2 ) x + (v-V/ x ) x . (11) 

Eqs. (10) and (1 1) are particularly useful in computing the volume distribution, in 
which a smooth multiplicative correction term f3(p c (V)) can be obtained from the mean 



force (logp)'(V) = /3 ((p c (V)) v - p) using the above method. 
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Appendix C. Switched Lennard-Jones potential 

7 

The Lennard-Jones potential is switched at r = r s to a polynomial ^a k (r-r c ) k 



k=4 



and extended to zero at r = r c . For simplicity, we shall assume the reduced unit, where 

both the energy s and radius a units are 1.0. In our application in energy histograms, 
we need the potential and the first three derivatives to be continuous, since the derivative 
of the dynamic temperature requires up to the third-order derivative of the potential. The 
continuity at r = r c is guaranteed by the first four vanishing coefficients. To ensure the 

continuity up to third-order derivatives at r = r s , the following parameters are used 

a 4 =4(35 r^A + 90 r 2 ArB + \5r s Ar 2 C + 2SAr 3 D)/(Ar 4 r s 15 ), 
a 5 =-24(14 r'A + 39 r 2 Ar B + 1 r s Ar 2 C + \4 Ar 3 D) l(Ar 5 r s 15 ), 

.3 A , OA/1 -.2 A.. D , 1(1.. A..2/-T , a A A. .3 r>\ //A. .6 ..15 



a 6 = 4 (70 r;A + 204 r^ArB + 39 r s Ar z C + S4 Ar i D)l(Ar b r s n ), 
a 7 =-\6(5r?A + \5 r 2 ArB + 3 r s Ar 2 C + TAr 3 D)/(Ar 7 r s 15 ), 



where, Ar = r c -r s , A = l-r? ,B = 2-r;, C = 26-7r s 6 , Z) = 13-2r; 



Appendix D. Potential from the two-dimensional mean force 

Unlike the one-dimensional case, determining a two-dimensional "potential" 
u(x, y) = log/? from the two mean force components f x = du/dx and / = du/dy is an 

overdetermined problem, as the number of variables is only half of the number of the 
equations. Hence we seek a "best fit" solution in the follows. 

We assume the potential is set up on a two-dimensional grid of vV x M with a cell 
(bin) size Sx x Sy . For a cell at (n,m), where n - 1,2, . . . N , m - 1,2, . . . M , we wish to 

minimize the difference between the mean force value from f n m and that from discretely 
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differentiating the u values at four cell corners (u n+lm +u n+lm+l -u nm -u nm+1 )/(2Sx) (and 
similarly for g n m ). To the end, we minimize the following action 5*, 



s = -Y 

2^ 



r \ 2 

n+\,m n+Lm+1 n.m n,m+\ r o 

— ■ —z fn,« & 



j 



+ 



U n,m+1 "*" M «+l,m+l U n,m U n+\,m 



o n ,m y 



At the minimum, we must have 8S/du n m = for every u n m , i.e. 



U n-l,m-l + U n-l,m+l + U n+l,m-l +U n+l,m+l 



J n— l,m— 1 J n— \jn J n,m—l J n,m r* , on-l,ffl-l & n,m-\ &n-\,m & n,tn r* 



The set of linear equations can be solved using Fourier transform. With 
,, =^ u nm exp[-2fti (kn / N + 1m / M)] (f kl and g kl similarly defined), we have 



zexp 



m 



N M 



sin 



\N j 



cos 



^n 



yMj 



f kl Sx + cos 



'ffio 



K N j 



sin 






&kj 5 y 



l-cos 



f 2nk ^ 
V A^ j 



cos 






A final inverse Fourier transform yields the desired potential u n m in the real space. 
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Figure Captions 

Fig. 1 (a) The key of the fractional identity is to convert the ratio of a histogram sum 
(shaded area) to the distribution density p(x*) to an integral of the mean force. We then 
use the ratio to divide the observed histogram sum to obtain an unbiased estimate of 
p(x*) . As both the histogram sum and ratio involves data from a window instead of a 
single bin, the resulting distribution is smoother due to the reduced uncertainty, (b) The 
auxiliary function <j){x) (solid) and its derivative <fi'(x) (dashed, ^-function at x = x* 
excluded) employed by the Adib-Jarzynski identity (Appendix A). 

Fig. 2 The potential energy distribution, (a) Comparison of the distributions from the 
histogram (gray), the fractional identity (Eq. (1), blue); and the Adib-Jarzynski (AJ) like 
identity (Eq. (7), red), (b) Error measured from the KS difference as a function of 
window size AU . (c) Error measured from the entropic distance, (d) The distribution 
from the weighted histogram analysis method (WHAM, red), compared with an 
improved version using the fractional identity (Eq. (6), blue), (e) Comparison of the 
distributions from a canonical ensemble (blue) and a microcanonical one (red). 

Fig. 3 The volume distribution p(V) (p = 0.1 15 and T = 1.24 , correction applied). 
Gray: the histogram; blue: the fractional identity; red: pure mean-force integration 
(limiting case with an infinite window). Inset: the window size. 

Fig. 4 The radial distribution function g{r) . (a) T x = 0.85 ; (b) T 2 = 0.4 . Gray: the 
histogram; blue: the fractional identity; red: the original Adib-Jarzynski (AJ) identity. 
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Fig. 5 The joint distribution of the two backbone dihedrals in the glycine dipeptide. (a) 
The histogram; (b) the fractional identity. 
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